@lihaoyi
Contributor

lihaoyi commented Oct 15, 2025

Implements scala/improvement-proposals#112

  • The initial lexing/scanning is lenient: it only looks for the opening delimiter (''' followed by any number of additional ' characters, i.e. ''''*) and the equivalent closing delimiter. This matches how we can expect the feature to be implemented in other tools that have more restricted lexing frameworks (IntelliJ w/ JFlex, VSCode w/ TextMate Grammars, NeoVim w/ TreeSitter)

  • All other validation (the opening delimiter must be followed by a newline; the closing delimiter must be preceded by whitespace only) and de-denting are left to the parsing phase. When an interpolator is present, that is the only point at which we have a complete string "literal", and are thus able to look at the indent whitespace preceding the closing delimiter and trim it from all earlier STRINGLIT/STRINGPART tokens

  • def interpolatedString needed to be refactored to support dedenting: rather than constructing the trees immediately, we first assemble all the string parts, then use the last string part to compute the dedent that we apply to all other parts, and only then do we construct the trees

  • Covered by neg/ tests and run/ tests for all the major features and edge cases I could think of:

    • All indentation removed
    • Some indentation preserved
    • Empty strings
    • Single-line strings
    • Blank lines in the string
    • Leading and trailing blank lines
    • Varying indentation
    • Extensible delimiters with 4 and 5 quotes
    • Funky operator and unicode characters in the string
    • Tab-based indentation
    • Interpolation with s and f
    • Single- and Multi-line pattern matching with and without interpolation
    • In larger expressions: lists, infix operators, etc.
    • As singleton-type ascriptions and singleton-type parameters
    • As literals passed to @compileTimeOnly
  • I haven't managed to run the tests reliably; I think I'm bumping into https://contributors.scala-lang.org/t/current-testcompilation/7256. Instead, I tested manually by copy-pasting the run/neg test files into the bin/scala REPL and comparing the output with the .check files on disk, and the output is identical
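
To make the "assemble the parts, then dedent using the last part" step above concrete, here is a minimal sketch. It is not the PR's code: `DedentSketch` and `dedentParts` are hypothetical names, the parts are plain strings rather than tokens or trees, and checks such as under-indented lines, mixed tabs and spaces, and `\r\n` line endings are left out.

```scala
object DedentSketch:
  /** Hypothetical helper, not the compiler's API.
    * `parts` are the literal text segments of one string: the first starts
    * right after the opening delimiter, the last ends right before the
    * closing delimiter. Interpolated expressions sit between the parts and
    * are never touched here.
    */
  def dedentParts(parts: List[String]): Either[String, List[String]] =
    require(parts.nonEmpty, "a literal has at least one string part")
    val last = parts.last
    val nl = last.lastIndexOf('\n')
    val isBlank = (c: Char) => c == ' ' || c == '\t'
    if !parts.head.startsWith("\n") then
      Left("opening delimiter must be followed by a newline")
    else if nl < 0 || !last.substring(nl + 1).forall(isBlank) then
      Left("closing delimiter must be preceded by whitespace only")
    else
      // The whitespace on the closing delimiter's line is the dedent.
      val indent = last.substring(nl + 1)
      // Remove that indent at the start of every line, in every part.
      val stripped = parts.map(_.replace("\n" + indent, "\n"))
      // Drop the newline after the opening delimiter ...
      val noLeading = stripped.updated(0, stripped.head.stripPrefix("\n"))
      // ... and the newline before the closing delimiter.
      val lastIdx = noLeading.length - 1
      Right(noLeading.updated(lastIdx, noLeading(lastIdx).stripSuffix("\n")))
```

Because only the literal parts are rewritten, a newline inside an interpolated value would pass through unchanged in this sketch; whether the real implementation behaves the same way is for the PR's tests to establish.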

@Gedochao Gedochao requested review from odersky and sjrd October 15, 2025 08:33
@Gedochao Gedochao changed the title WIP dedented triple-quoted string literals SIP-72: WIP dedented triple-quoted string literals Oct 15, 2025
@odersky
Contributor

odersky commented Oct 15, 2025

That was fast!

@Gedochao Gedochao added the needs-minor-release, needs-sip, and stat:sip-in-progress labels and removed the needs-sip label Oct 15, 2025
@lihaoyi
Contributor Author

lihaoyi commented Oct 15, 2025

Not ready to review yet! Still need a bit more vibing haha

@Gedochao
Contributor

> Not ready to review yet! Still need a bit more vibing haha

Ah right, I'll convert it to draft then

@Gedochao Gedochao marked this pull request as draft October 15, 2025 08:49

val hasTabs = closingIndent.contains('\t')
val hasSpaces = closingIndent.contains(' ')
if (hasTabs && hasSpaces) {
Contributor

Should be able to detect this in one loop
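
A minimal sketch of what a single-pass check could look like, assuming the goal is only to know whether the closing indent mixes tabs and spaces. `IndentCheck` and `hasMixedIndent` are hypothetical names, not part of the PR.

```scala
object IndentCheck:
  /** True when `indent` contains both a tab and a space.
    * Scans the string once and stops as soon as both have been seen.
    */
  def hasMixedIndent(indent: String): Boolean =
    var sawTab = false
    var sawSpace = false
    var i = 0
    while i < indent.length && !(sawTab && sawSpace) do
      indent.charAt(i) match
        case '\t' => sawTab = true
        case ' '  => sawSpace = true
        case _    => ()
      i += 1
    sawTab && sawSpace
```

Compared with two `contains` calls, this walks the indent at most once; for the short strings involved the difference is mostly about clarity rather than speed.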

else
literal(inTypeOrSingleton = true)

/** Dedent a string literal by removing common leading whitespace.
Contributor

For new code in the compiler we use indentation syntax and new conditional if / then / else syntax. The old Java conditional syntax is already disabled under -language.future.
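
For illustration, here is how the tab/space check quoted earlier might read in the indentation-based style with `if / then / else`. The object and method names are hypothetical, and the returned labels exist only to make the sketch self-contained.

```scala
object SyntaxSketch:
  // Brace style, as in the snippet under review:
  //   if (hasTabs && hasSpaces) { ... } else { ... }
  // Indentation style with `then`, as requested for new compiler code:
  def describeIndent(closingIndent: String): String =
    val hasTabs = closingIndent.contains('\t')
    val hasSpaces = closingIndent.contains(' ')
    if hasTabs && hasSpaces then "mixed"
    else if hasTabs then "tabs"
    else if hasSpaces then "spaces"
    else "none"
```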

Comment on lines 1555 to 1566
val isDedented =
in.charOffset + 2 < in.buf.length &&
in.buf(in.charOffset - 1) == '\'' &&
in.buf(in.charOffset) == '\'' &&
in.buf(in.charOffset + 1) == '\''
in.nextToken()
def nextSegment(literalOffset: Offset) =
segmentBuf += Thicket(
literal(literalOffset, inPattern = inPattern, inStringInterpolation = true),
atSpan(in.offset) {
if (in.token == IDENTIFIER)
termIdent()
else if (in.token == USCORE && inPattern) {
in.nextToken()
Ident(nme.WILDCARD)
}
else if (in.token == THIS) {
in.nextToken()
This(EmptyTypeIdent)
}
else if (in.token == LBRACE)
if (inPattern) Block(Nil, inBraces(pattern()))
else expr()
else {
report.error(InterpolatedStringError(), source.atSpan(Span(in.offset)))
EmptyTree
}
})

var offsetCorrection = if isTripleQuoted then 3 else 1
while (in.token == STRINGPART)
nextSegment(in.offset + offsetCorrection)
// Collect all string parts and their offsets
val stringParts = new ListBuffer[(String, Offset)]
val interpolatedExprs = new ListBuffer[Tree]

var offsetCorrection = if (isDedented) 3 else if (isTripleQuoted) 3 else 1
Contributor Author

This bit is super sketchy, I'm sure there's a better way

@lihaoyi lihaoyi marked this pull request as ready for review October 25, 2025 00:43
@lihaoyi
Contributor Author

lihaoyi commented Oct 25, 2025

Marking this as ready to review since the SIP has been voted into the experimental phase

@lihaoyi lihaoyi changed the title SIP-72: WIP dedented triple-quoted string literals SIP-72: dedented triple-quoted string literals Oct 25, 2025
@lihaoyi
Contributor Author

lihaoyi commented Oct 25, 2025

@odersky @sjrd looks like most of the tests are green, so this should be ready for review. I couldn't find any error message in the failing job's logs; I might need someone more familiar with the CI setup to take a look

@lihaoyi
Contributor Author

lihaoyi commented Oct 25, 2025

The last failure turned out to be due to error message positioning issues; I fixed that and it's all green now

@Gedochao Gedochao requested a review from odersky October 27, 2025 08:00
@Gedochao Gedochao assigned sjrd and odersky and unassigned odersky Oct 27, 2025
@Gedochao Gedochao requested a review from noti0na1 October 27, 2025 08:26

val isDedented =
in.charOffset + 2 < in.buf.length &&
in.buf(in.charOffset - 1) == '\'' &&
Member

Why do we look at offset - 1 here? Also, shouldn't we be able to reuse the logic of isDedentedStringLiteral?

*
* @param str The string content to dedent
* @param offset The source offset where the string literal begins
* @return The dedented string, or str if errors were reported
Member

Given the importance of other parameters, it would be better to explain them as well, maybe with an example.

}

/** Extract the closing indentation from the last line of a string */
private def extractClosingIndent(str: String, offset: Offset): (String, Boolean) = {
Member

I would use an Option[String] for the result.
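
A sketch of the suggested shape, assuming the `Boolean` in the current `(String, Boolean)` result only signals failure. The `offset` parameter and error reporting are omitted, so this is a simplified stand-in rather than a drop-in replacement.

```scala
object ClosingIndentSketch:
  /** Returns the whitespace preceding the closing delimiter, or None when
    * there is no final line break or the last line contains anything other
    * than spaces and tabs.
    */
  def extractClosingIndent(str: String): Option[String] =
    val nl = str.lastIndexOf('\n')
    if nl < 0 then None
    else
      val lastLine = str.substring(nl + 1)
      Option.when(lastLine.forall(c => c == ' ' || c == '\t'))(lastLine)
```

A caller could then pattern match on `Some(indent)` and `None` instead of unpacking a tuple and checking a flag.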

val linesAndWithSeps = (str.linesIterator.zip(str.linesWithSeparators)).toSeq
var lineOffset = offset
// start counting error location offsets only after opening delimiter
while(in.buf(lineOffset) == '\'') lineOffset += 1
Member

missing a space after while

@@ -0,0 +1,288 @@
// Test runtime behavior of dedented string literals
Member

Can you add the following small tests:

val x = s'''
${"content with\nnewline"}
more text
'''

val nested = s'''
outer ${'''
  inner
'''}
'''

val nested2 = s'''
outer ${s'''
  inner with $x
'''}
'''


Labels

needs-minor-release (This PR cannot be merged until the next minor release)
stat:sip-in-progress

7 participants